{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 5 Solutions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 1\n",
    "Firstly, let's rewrite $f(x)$ as $f(x) = 4\\log (x)\\sin(x^3)$.\n",
    "\n",
    "Then, $f'(x) = \\frac4x\\sin(x^3)+12x^2\\log(x)\\cos(x^3)$.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 2\n",
    "If we rewrite our function as $f(x) = (1+\\exp(-x))^{-1}$, then we have $f'(x) = \\frac{\\exp(-x)}{(1+\\exp(-x))^2}$.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 3\n",
    "We have $f'(x) = \\frac{\\mu-x}{\\sigma^2} f(x)$.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 4\n",
    "We compute the first five derivatives of our function at 0. We have $f(0)=f'(0)=1$, $f^{(2)}(0)=f^{(3)}(0)=-1$, and $f^{(4)}(0)=f^{(5)}(0)=1$.\n",
    "\n",
    "The Taylor polynomial $T_5(x) = 1+x-\\frac12 x^2 -\\frac16 x^3 + \\frac1{24} x^4 + \\frac1{120}x^5$. The lower-order Taylor polynomials can be found by truncating this expression appropriately.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 5\n",
    "### Part a\n",
    "We can see below that $\\frac{\\partial f_1}{\\partial x}$ has dimension $1\\times2$; $\\frac{\\partial f_2}{\\partial x}$ has dimension $1\\times n$; and $\\frac{\\partial f_3}{\\partial x}$ has dimension $n^2\\times n$.\n",
    "\n",
    "### Part b\n",
    "We have $\\frac{\\partial f_1}{\\partial x} = \\begin{bmatrix}\n",
    "\\cos(x_1)\\cos(x_2)&-\\sin(x_1)\\sin(x_2)\n",
    "\\end{bmatrix}$;\n",
    "$\\frac{\\partial f_2}{\\partial x} = y^{\\mathsf{T}}$. (Remember, $y$ is a column vector!)\n",
    "\n",
    "### Part c\n",
    "Note that $xx^{\\mathsf{T}}$ is the matrix $\\begin{bmatrix}\n",
    "x_1^2 & x_1x_2 & \\cdots & x_1x_n \\\\\n",
    "x_2x_1 & x_2^2 & \\cdots & x_2x_n \\\\\n",
    "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
    "x_nx_1 & x_nx_2 & \\cdots & x_n^2\n",
    "\\end{bmatrix}$. Thus its derivative will be a higher-order tensor. However, if we consider the matrix to be an $n^2$-dimensional object in its own right, we can compute the Jacobian. Its first row consists of $\\begin{bmatrix}\n",
    "2x_1&x_2&\\cdots&x_n~|~x_2&0&\\cdots&0~|~\\cdots~|x_n&0&\\cdots&0\n",
    "\\end{bmatrix}\n",
    "$, where I have inserted a vertical bar every $n$ columns, to aid readability.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 6\n",
    "We have $\\frac{df}{dt} = \\cos(\\log(t^{\\mathsf{T}}t))\\cdot\\frac1{t^{\\mathsf{T}}t}\\cdot \\begin{bmatrix}\n",
    "2t_1&2t_2&\\cdots&2t_D\n",
    "\\end{bmatrix}\n",
    "= \\cos(\\log(t^{\\mathsf{T}}t))\\cdot\\frac{2t^{\\mathsf{T}}}{t^{\\mathsf{T}}t}\n",
    "$.\n",
    "\n",
    "For $g$, if we explicitly compute $AXB$ and find its trace, we have that $g(X) = \\sum_{k=1}^D \\sum_{j=1}^F \\sum_{i=1}^E a_{ki}x_{ij}b_{jk}$. Thus we have, $\\frac{\\partial g}{\\partial x_{ij}} = \\sum_{k=1}^D b_{jk}a_{ki}$, and this is the $(i,j)$-th entry of the required derivative. Hence $\\frac{dg}{dX} = B^{\\mathsf{T}}A^{\\mathsf{T}}$.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 7\n",
    "### Part a\n",
    "The chain rule tells us that $\\frac{df}{dx} = \\frac{df}{dz}\\frac{dz}{dx}$, where $\\frac{df}{dz}$ has dimension $1\\times 1$, and $\\frac{dz}{dx}$ has dimension $1\\times D$. We know $\\frac{dz}{dx}=2x^{\\mathsf{T}}$ from $f$ in Question 6. Also, $\\frac{df}{dz} = \\frac{1}{1+z}$.\n",
    "\n",
    "Therefore, $\\frac{df}{dx} = \\frac{2x^{\\mathsf{T}}}{1+x^{\\mathsf{T}}x}$.\n",
    "\n",
    "### Part b\n",
    "Here we have $\\frac{df}{dz}$ is an $E\\times E$ matrix, namely $\\begin{bmatrix}\n",
    "\\cos z_1 & 0 &  \\cdots & 0\\\\\n",
    "0 & \\cos z_2 &  \\cdots & 0\\\\\n",
    "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
    "0 &  0&\\cdots&\\cos z_E\n",
    "\\end{bmatrix}$.\n",
    "\n",
    "Also, $\\frac{dz}{dx}$ is an $E\\times D$-dimensional matrix, namely $A$ itself.\n",
    "\n",
    "The overall derivative is obtained by multiplying these two matrices together, which will again give us an $E\\times D$-dimensional matrix.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 8\n",
    "### Part a\n",
    "We have $\\frac{df}{dz}$ has dimension $1\\times 1$, and is simply $-\\frac12 \\exp(-\\frac12 z)$.\n",
    "\n",
    "Now, $\\frac{dz}{dy}$ has dimension $1\\times D$, and is given by $y^{\\mathsf{T}}(S^{-1}+(S^{-1})^{\\mathsf{T}})$.\n",
    "\n",
    "Finally, $\\frac{dy}{dx}$ has dimension $D\\times D$, and is just the identity matrix.\n",
    "\n",
    "Again, we multiply these all together to get our final derivative.\n",
    "\n",
    "### Part b\n",
    "If we explicitly write out $xx^{\\mathsf{T}}+\\sigma^2I$, and compute its trace, we find that $f(x) = x_1^2 + \\dots + x_n^2 + n\\sigma^2$.\n",
    "\n",
    "Hence, $\\frac{df}{dx} = 2x^{\\mathsf{T}}$.\n",
    "\n",
    "### Part c\n",
    "Here, $\\frac{df}{dz} = \\begin{bmatrix}\n",
    "\\frac{1}{\\cosh^2z_1}&0&\\cdots&0\\\\\n",
    "0&\\frac{1}{\\cosh^2z_2}&\\cdots&0\\\\\n",
    "\\vdots & \\vdots&\\ddots&\\vdots\\\\\n",
    "0&0&\\cdots&\\frac{1}{\\cosh^2z_M}\n",
    "\\end{bmatrix}$, while $\\frac{dz}{dx} = A$, as in Question 7b.\n",
    "\n",
    "Finally, $\\frac{df}{dx}$ is given by the product of these two matrices.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 9\n",
    "Piecing this together, replacing $z$ with $t(\\epsilon,\\nu)$ throughout, we have that $g(\\nu) = \\log(p(x,t(\\epsilon,\\nu))) - \\log(q(t(\\epsilon,\\nu),\\nu))$.\n",
    "\n",
    "Therefore, $\\frac{dg}{d\\nu} = \\frac{p'(x,t(\\epsilon,\\nu))\\cdot t'(\\epsilon,\\nu) }{p(x,t(\\epsilon,\\nu))} - \\frac{q'(t(\\epsilon,\\nu),\\nu)\\cdot t'(\\epsilon,\\nu)}{q(t(\\epsilon,\\nu),\\nu)}\n",
    "$."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
